Before you read any further, there are NO free links on Search Engine Watch. I just made that up to get your attention. Well… kinda. Ok, so maybe there ARE free links… but only if you’re an SEO wannabe. I’m only using SEW as an example, because (1) it’s the first example I could find, and (2) I’m hoping that someone over there will read this, fix their site, and then show their gratitude by buying me a house. Ok…maybe not a house. Maybe a nice car? Ok…maybe not a nice car. Maybe a new bike? A sandwich?
Anyway… sweet prizes aside, I have to ask you all to fight the temptation to immediately run over to SEW and start spamming them for links. Trust me…it’s a waste of time. All you would get out of it is the satisfaction of knowing that I will never enjoy my free sandwich. The point of this blog post is not to promote black hat techniques. I’m only going to show you one to make you all aware of how easy it is to jeopardize your site’s rankings without even knowing it. Reading this information will be kinda like when DEA agents have to learn how to make drugs: They’re not allowed to actually make drugs–they just need to know what to look for when they’re raiding houses ‘n’ shit. If they ever DID make drugs, then chances are the chief would not be earning a free sandwich anytime soon. You get what I’m saying? No? Oh well…no one does. Let’s keep moving.
In the recent post by Stephen T about XSS security, he offers some good tips on how to prevent XSS security vulnerabilities. I made a comment on that post that contained some potentially valuable information, but instead of maximizing that value, I opted for my usual commenting style: Be weird and confuse people. After some consideration, I think maybe it’s important enough to put in a separate blog post (this one, to be exact), and expand on it a bit. So the following sections are going to be a repeat of my comment… but with a few [EDIT] comments added in for increased value/clarity. Here we go…
@Blackhats…
When exploiting an XSS vulnerability for inbound links, don’t just embed a single link–surround it with relevant content too. This could increase the value (relevance) that you gain from the inbound link, but more importantly, it allows you to submit multiple pages of embedded links to the search engines without wondering if they’re unique enough to be indexed. (This is assuming that the site you’re exploiting hasn’t limited the number of characters allowed in the input field.) Just be mindful of the URL length your content creates, since search engines are less likely to index a URL that contains entire paragraphs.
[EDIT: This isn’t sound advice. I was merely trying to sound cool and possibly acquire some YOUmoz street cred by posing as a black hat. The truth is…I don’t even have a hat. I spend so much time THINKING about SEO that I never get around to actually DOING SEO. In any case, the alleged “tips” mentioned above are actually pretty stupid. It’s like telling a DEA agent that the best way to cook meth is with a Bunsen burner instead of a propane torch! LMAO! Am I right or am I right? Come on…who’s with me? No one? Moving on.]
@Whitehats…
One of the most common sources of XSS vulnerabilities is a site's search feature. If your site has one, try this simple test to see if it’s vulnerable:
- Search for a gibberish query, e.g., [asdfgseaneatsbabieshjkl]
- Look at the results page your site returns.
- Look to see if it says something like this: “Your search for ‘asdfgseaneatsbabieshjkl’ returned 0 results.”
- If it does return something like that, then try another search…only this time type in the HTML code that would create a link, e.g., a link whose anchor text is [baby eater]
- Look to see if it says something like this: “Your search for ‘baby eater’ returned 0 results,” with the words “baby eater” showing up as a clickable link.
- If it does, then look at the URL of the results page you are on. Chances are it is a unique URL that ends in something like “/search.asp?q=” followed by whatever your search query was. This is what a spammer submits to search engines!
[EDIT: Admittedly, this simple test is perhaps overly simple. Just because your site doesn’t respond this way doesn’t mean your site is secure. The point I was trying to make is that input fields, like those used for site searches, are a common target for spammers to “inject” spamalicious code.]
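For the whitehats who fail the test above: the usual fix is to escape the query before echoing it back onto the results page. Here is a minimal sketch in Python, assuming your results page builds that “Your search for…” line as a string. The function name `results_message` and the example.com link are made up for illustration; they are not from any real site.

```python
import html


def results_message(query: str, count: int = 0) -> str:
    """Build the 'Your search for ...' line with the query escaped for HTML."""
    # html.escape turns <, >, & and quotes into entities, so the browser
    # displays the query as plain text instead of parsing it as markup.
    safe_query = html.escape(query)
    return f"Your search for '{safe_query}' returned {count} results."


# Hypothetical injected query; example.com stands in for a spammer's site.
injected = '<a href="http://example.com/">baby eater</a>'
print(results_message(injected))
```

With the query escaped, the visitor sees the raw HTML code as text, and there is no link for a search engine to follow. This only covers the one spot where the query is echoed; any other place that prints user input needs the same treatment.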
Tips:
If you find out that your site has a vulnerable search feature, check to see if anyone has taken advantage of it. Go to GYM (Google, Yahoo, and MSN) and do a site search (and/or an inurl search) for your search results page, e.g., “site:domain.com/search.asp” or “inurl:domain.com/search.asp”.
If these searches produce pages with embedded links, then you’ve been exploited by a spammer. Luckily, it’s obvious who is responsible, since the spammer most likely linked to their own site. Check to see who registered the domain. If the spammer is dumb enough to have registered the domain under his real address, go to that address and punch the spammer in the eye. Otherwise, disallow the results page URL in your robots.txt and use the available URL removal tools provided by the search engines.
[EDIT: Don’t actually punch any spammers’ eyes. That’s not how we treat people. You don’t just punch someone’s eye when you don’t like what they link. Again… I was probably just trying to build YOUmoz street cred. Anyway… see below for a clear example using Search Engine Watch’s search feature. I spent like a hundred years Photoshopping these screenshots, just so you won’t have an excuse to go bother SEW.]
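A quick aside on the robots.txt step from the tips above, before we get to the walkthrough. The sketch below assumes the results page lives at /search.asp on domain.com (the same placeholder used in the tips) and uses Python's standard `urllib.robotparser` to confirm that a Disallow rule really does cover the spammed URLs. The rule text and the `is_crawlable` helper are illustrative, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for a site whose results page lives at /search.asp.
ROBOTS_TXT = """\
User-agent: *
Disallow: /search.asp
"""


def is_crawlable(url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a generic crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("*", url)


# The results page (with any query string) should be blocked...
print(is_crawlable("http://domain.com/search.asp?q=anything"))
# ...while ordinary content pages stay crawlable.
print(is_crawlable("http://domain.com/about.html"))
```

Because the rule is a path prefix, it covers every query string hanging off the results page. Pages that are already indexed still need the search engines' URL removal tools.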
For the sake of this learning exercise, let’s pretend for just a second that we are NOT honest, ethical, SEO experts. Let’s pretend that we are a spammer, and we are trying to acquire a free inbound link from Search Engine Watch. First, we find a place to inject code. It is pretty easy to spot:
Then we “do a search.” We type in the HTML code for a link to our page, using keyword-rich anchor text:
After hitting enter or clicking the SEARCH button, we are taken to the Search Engine Watch results page, where we analyze the success/failure of our spam attempt. In this case, it first appears as though our little trick worked, because we see this on the page:
The words “baby eater” are linked, but the link doesn’t go to our page… it goes to a non-existent URL that returns Search Engine Watch’s 404 page. Another thing we notice on the results page is that the search box now looks like this:
To figure out what’s going on here, we view the page’s source code. Doing a Ctrl + F for “baby eater” reveals the code we injected, but it has been modified by the server. Before the server created the results page with our query string, it added a slash before the quotation marks, as seen here:
So now we try again, but this time we search for the HTML code without quotation marks in it. Using markup code that is non-compliant with the W3C standards is indeed an unconscionable act, but hopefully the benefits of our new inbound link will offset the penalty (or penalties) that Google assigns to anyone affiliated with pages that don’t pass XHTML Strict validation. Our second attempt produces a winning result. The link works properly, and the search field is back to normal:
Now all we need to do is take the unique URL of the results page and submit it to Google! Unfortunately, our search didn’t produce a unique URL.
Again, we go to the source code. This time we Ctrl + F for the form that creates the search box. Once we find it, we look at the input field that receives search queries, to find out what the name attribute is:
Then we add that name as a parameter to our results page URL, and set the parameter value equal to the search query (the one from our second try). Like this:
Start with the results page URL:
http://searchenginewatch.com/showPage.html?page=sew_search_results
Add the name from the input field:
&q=
Add the search query from our 2nd try:
This gives us our unique URL:
http://searchenginewatch.com/showPage.html?page=sew_search_results&q=baby eater
Paste the URL in your browser (which should percent-encode the illegal characters, i.e., replace each one with a % followed by its hexadecimal code… I think), and you should see the SEW page with our link on it!
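If you want to see what that conversion looks like without trusting your browser, here is a rough sketch using Python's standard `urllib.parse`. The base URL and the `q` parameter come from the steps above; the query string itself is a hypothetical stand-in (with example.com as the link target), and real browsers may leave a few more characters unencoded than this does.

```python
from urllib.parse import quote, unquote

BASE = "http://searchenginewatch.com/showPage.html?page=sew_search_results"


def build_results_url(query: str, param: str = "q") -> str:
    """Append the query to the results page URL, percent-encoded."""
    # safe="" tells quote() to encode every reserved character, "/" included.
    return f"{BASE}&{param}={quote(query, safe='')}"


# Hypothetical query: link HTML without quotation marks, as in the second try.
query = "<a href=http://example.com/>baby eater</a>"
url = build_results_url(query)
print(url)

# Decoding the parameter value gives back the original query.
print(unquote(url.split("&q=", 1)[1]))
```

Spaces come out as %20 and the angle brackets as %3C and %3E, which is the same notation that shows up in spammed URLs listed by the search engines.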
Now that we know what a spammed URL would look like, let’s stop pretending we’re spammers and start pretending we’re the SEO who optimizes/webmasters the Search Engine Watch website. Now we want to use our knowledge of the spammy URL format to see if anyone has abused our XSS vulnerability. We go to Google and search for this:
site:searchenginewatch.com/showPage.html inurl:sew_search_results
It should return 2 results. Click the link that says “repeat the search with the omitted results included” and it returns the full 61 results. This may seem like we’ve been overrun with spammers, but after analyzing the URLs that Google has listed, we realize that most of these bring up a SEW page that actually has results on it. In other words, our spam trick only worked because Search Engine Watch doesn’t have content about baby eaters. If we created a unique URL that set q equal to “reputation management” (which is what most of the 61 URLs contain), then it would simply return all the SEW content pages that include that phrase. This may create a duplicate content issue, but it doesn’t mean we were spammed.
Out of all 61 results, one particular result in Google stands out:
This URL appears to contain an IP address, and it also contains the characters %3E, which is the percent-encoded (hexadecimal) form of the closing angle bracket of HTML tags (i.e., “>”). So, right away we should be suspicious. Sure enough, clicking on this Google result brings up a SEW page that contains this:
The search query that produced this is:
Buy levitra online order levitra on line order no prescription levitra without
I can’t tell what this guy was trying to accomplish (assuming it wasn’t a bot), but it didn’t work (the iframe content is the Search Engine Watch 404 page). I desperately tried figuring out what the intended iframe src was supposed to be (not because it would’ve added anything to this blog post, but because I was trying to score some cheap Levitra online without a prescription), but I couldn’t find any IP address variations that produced an actual site. But that’s not the point. The point is this very well COULD have been a crawlable link, instead of a broken iframed tease. If someone is going through the trouble to steal links like this, chances are their site isn’t exactly chillin’ in a safe neighborhood, you feel me? No? Damn it.
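Eyeballing 61 URLs for a stray %3E is doable; eyeballing 6,000 is not. Here is a small sketch of how that check could be automated in Python, assuming you have already collected the indexed results page URLs into a list. The helper `looks_injected` and both sample URLs are hypothetical (example.com stands in for whatever the spammer pointed at), and an angle bracket in a query is only a hint that something is off, not proof.

```python
from urllib.parse import parse_qsl, urlsplit


def looks_injected(url: str) -> bool:
    """Return True if any decoded query value contains an angle bracket."""
    # parse_qsl percent-decodes the values, so %3C and %3E become < and >.
    values = (value for _, value in parse_qsl(urlsplit(url).query))
    return any("<" in v or ">" in v for v in values)


# Hypothetical URLs shaped like the ones discussed above.
urls = [
    "http://searchenginewatch.com/showPage.html"
    "?page=sew_search_results&q=reputation+management",
    "http://searchenginewatch.com/showPage.html"
    "?page=sew_search_results&q=%3Ciframe+src%3Dhttp%3A%2F%2Fexample.com%3E",
]

for u in urls:
    print(looks_injected(u), u)
```

The ordinary “reputation management” search passes, and the one carrying encoded markup gets flagged for a closer look.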
What I mean is, if you are ignoring XSS security, then you could be putting your rankings at risk. The example above clearly shows the potential for a spammer to put links on your site. When Google sees that you’re linking to bad neighborhoods, you’re gonna get hit with a ranking penalty–one that makes the W3C non-compliance penalty feel like acquiring relevant backlinks from Wikipedia.gov.edu.mil. ROTFLMFAO! You GOTTA like that one, eh? No?
*sigh*
I need a sandwich.